perm filename PATREC[4,KMC]2 blob sn#078951 filedate 1973-12-22 generic text, type T, neo UTF8
00100	AN  ALGORITHM WHICH CHARACTERIZES NATURAL LANGUAGE
00200	          DIALOGUE EXPRESSIONS
00300	
00400	
00500	
00600		COLBY AND PARKISON
00700	
00800	OUTLINE
00900		INTRODUCTORY -Discussion of language as code, other approaches
01000		              sentence versus word dictionary using projection
01100	                      rules to yield an interpretation from word definitions.
01200		              experience with old Parry.
01300		PROBLEMS -dialogue problems and methods. Constraints. Special cases.
01400		Preprocessing- dict words only
01500		              translations
01600		              contractions
01700			      expansions
01800			      synonyms
01900			      negation
02000		Segmenting - prepositions, wh-words, meta-verbs
02100			    give list
02200		Matching - simple and compound patterns
02300			  association with semantic functions
02400			  first coarsening - drop fillers- give list
02500			  second coarsening - drop one word at a time
02600			  dangers of insertion and restoration
02700			Recycle condition- sometimes a pattern containing pronouns
02800			is matched, like "DO YOU AVOID THEM".  If THEM could be
02900			a number of different things and Parry's answer depends on
03000			which one it is, then the current value of the anaphora,
03100			THEM, is substituted for THEM and the resulting pattern
03200			is looked up.  Hopefully, this will produce a match to a
03300			more specific pattern, like "DO YOU AVOID MAFIA".
03400			  default condition - pass surface to memory
03500			                      change topic or level
03600		Advantages - real-time performance, pragmatic adequacy and
03700			           effectiveness, performance measures.
03800			    "learning" by adding patterns
03900			    PARRY1 ignored word order- penalty too great
04000			    PARRY1 too sequential taking first pattern it found
04100			       rather than looking at whole input and then deciding.
04200			   PARRY1 had patterns strung out throughout procedures
04300			       and thus cumbersome for programmer to see what patterns were.
04400		Limitations - typical failures, possible remedies
04500		Summary
04600	
04700	
04800		By  "characterize"  we  are  referring  to   a   process,   a
04900	multi-stage  sequence  of  functions,  which progressively transforms
05000	natural language input expressions into a  pattern  which  eventually
05100	best matches a stored pattern whose name has a pointer to the name of
05200	a response function. Response functions decide what to  do  once  the
05300	input  has  been  characterized.      Here  we shall discuss only the
05400	characterizing functions, except for one response function (anaphoric
05500	substitution) which interactively aids the characterization process.
05600		In constructing and testing a simulation of paranoid
05700	processes, we were faced with the problem of reproducing paranoid
05800	linguistic behavior in  a  diagnostic  psychiatric  interview.    The
05900	diagnosis   of  paranoid  states,  reactions  or  modes  is  made  by
06000	clinicians who judge a degree of  correspondence  between  what  they
06100	observe  linguistically in an interview and their conceptual model of
06200	paranoid behavior. There exists a high degree of agreement about this
06300	conceptual  model which relies mainly on what an interviewee says and
06400	how he says it.
06500		Natural language is a life-expressing  code  people  use  for
06600	communication  with  themselves  and others.  In a real-life dialogue
06700	such as a psychiatric interview,  the  participants  have  interests,
06800	intentions,  and  expectations which are revealed in their linguistic
06900	expressions.  To produce effects on an interviewer which he would
07000	judge similar to the effects produced by a paranoid patient, an
07100	interactive simulation of a paranoid patient must be able to
07200	demonstrate typical paranoid interview behavior. Thus it must have
07300	an ability to deal with the linguistic behavior of the interviewer
07400	adequate to achieve the desired effects.
07500		There are a  number  of  approaches  one  might  consider  in
07600	handling   dialogue   expressions.       One  approach  would  be  to
07700	construct a dictionary of all expressions which could possibly  arise
07800	in  an  interview.  Associated  with  each  expression  would  be its
07900	interpretation depending on dialogue context.  For  obvious  reasons,
08000	no one takes this approach  seriously.    Instead  of  an  expression
08100	dictionary,  one  might  construct  a  word  dictionary  and then use
08200	projection rules to yield an interpretation of a  sentence  from  the
08300	dictionary  definitions.  This, for example, has been the approach of
08400	Winograd [ ] and Woods [ ]. Such a method performs adequately as long
08500	as the dictionary involves only a few hundred words, each word having
08600	only one or two senses, and the dialogue is limited to  a  mini-world
08700	of only a few objects and relations.  But the problems which arise in
08800	a psychiatric interview conducted in  unrestricted  English  are  too
08900	great for this method to be useful in real-time dialogues requiring a
09000	waiting time of less than 10 seconds.
09100		Little  is  known  about how humans process natural language.
09200	They can be shown to possess some knowledge of grammar rules but this
09300	does not entail that they use a grammar in interpreting and producing
09400	language. Irregular verb-tenses and noun-plurals do not follow rules;
09500	yet  people use thousands of them. One school of linguistics believes
09600	that people possess full  transformational  grammars  for  processing
09700	language.  In  our  view  this  position  seems  dubious.  Originally
09800	transformational grammars were not designed to "understand"  a  large
09900	subset  of  English;  they  represented  a set of axioms for deciding
10000	whether a string is "grammatical". Efforts  to  use  them  for  other
10100	purposes have not been fruitful.
10200		An analysis  of  what  the  problem  is  guides  one  to  the
10300	selection  or  invention  of methods appropriate to its solution. Our
10400	problem was not to develop a consistent  theory  of  language  or  to
10500	assert empirical  hypotheses  about  how people process language. Our
10600	problem was to characterize what is being said in a dialogue and what
10700	is being said about it in order to make a response such that a sample
10800	of I-O pairs from the paranoid model is judged similar to a sample of
10900	I-O  pairs  from  paranoid  patients.  We are not making an existence
11000	claim that our strategy represents the way people  process  language.
11100	We  sought  an  efficacious method which could operate efficiently in
11200	real time. Its relation to methods humans use is, by way of  analogy,
11300	a  workable  possibility,  i.e.  "something like this" might occur in
11400	people.
11500		For  our problem, managing the communicative uses and effects
11600	of natural language, we adopted a strategy of transforming the  input
11700	until  a  pattern  is  achieved which matches to some degree a stored
11800	pattern.  This strategy is adequate for our purposes  a  satisfactory
11900	percentage  of  the  time.    (No  one  expects  an  algorithm  to be
12000	successful 100% of the time since not even humans, the best natural
12100	language systems around, achieve this level of performance).  The
12200	power of this method for  natural  language  dialogues  lies  in  its
12300	ability  to ignore unrecognizable expressions and irrelevant details.
12400	A conventional parser  doing  word-by-word  analysis  fails  when  it
12500	cannot  find  one or more of the input words in its dictionary.   Its
12600	weakness is that it must know; it cannot guess.
12700		In early versions of the paranoid model,  (PARRY1),  many  of
12800	the pattern recognition mechanisms were weak because they allowed the
12900	elements of the pattern  to  be  order  independent.    For  example,
13000	consider the following expressions:
13100		(1) WHERE DO YOU WORK?
13200		(2) WHAT SORT OF WORK DO YOU DO ?
13300		(3) WHAT IS YOUR OCCUPATION ?
13400		(4) WHAT DO YOU DO FOR A LIVING ?
13500		(5) WHERE ARE YOU EMPLOYED ?
13600	In PARRY1 a procedure would scan these  expressions  looking  for  an
13700	information-bearing  contentive  such as "work", "for a living", etc.
13800	If it found such a contentive along with a "you"  or  "your"  in  the
13900	expression,  regardless  of  word  order,  it  would  respond  to the
14000	expression as if it were a question about the nature of  one's  work.
14100	(There  is  some  doubt  this  even  qualifies  as  a  pattern  since
14200	interrelations  between  words are ignored and only their presence is
14300	considered).    An insensitivity to word order has the advantage that
14400	lexical  items  representing  different parts of speech can represent
14500	the same concept, e.g. "work" as noun or as verb. But we found from
14600	experience that, since English relies heavily on word order to convey
14700	the meaning of its messages, the mean penalty of errors was too
14800	great. Hence in PARRY2, as will be described in detail, all the
14900	patterns require a specified word order.
15000		It is a truism for high-complexity problems that it is useful
15100	to  have  constraints.     Diagnostic  psychiatric  interviews   (and
15200	especially  those  conducted  over  teletypes)  have  several natural
15300	constraints.  First, clinicians are trained to ask certain  questions
15400	in  certain  ways. These stereotypes can be treated as special cases.
15500	Second, only  a  few  hundred  standard  topics  are  brought  up  by
15600	interviewers   who  are  trained  to  use  everyday  expressions  and
15700	especially those used by the patient himself. When the  interview  is
15800	conducted  over teletypes, expressions tend to be shortened since the
15900	interviewer tries to increase the information transmission rate  over
16000	the slow channel of a teletype.  (It is said that short expressions
16100	are more grammatical, but think about the phrase "Now now, there
16200	there.")  Finally,  teletyped  interviews  represent  written speech.
16300	These expressions are full of idioms, cliches, pat phrases,  etc.   -
16400	all being easy prey for a pattern recognition approach.  It is futile
16500	to try to decode an idiom by analyzing the meanings of its individual
16600	words.  One knows what an idiom refers to or one does not.
16700		We shall describe the algorithm in three sections devoted to
16800	preprocessing, segmenting, and matching (with recycling).
16900	
17000			PREPROCESSING
17100	
17200		Each word in the input expression is first  looked  up  in  a
17300	dictionary  of 1240 words. The dictionary consists of a list of words
17400	and other words they can be translated into. (SHOW PIECE OF DICT?) If
17500	a  word in the input is not in the dictionary, it is dropped from the
17600	pattern being formed. Thus if the input were:
17700		WHAT  IS  YOUR  CURRENT OCCUPATION?
17800	and  the word "current" is not in the dictionary, the pattern at this
17900	phase becomes:
18000		(  WHAT IS YOUR OCCUPATION )
18100	The question-mark is thrown away since questions  are  recognized  by
18200	word  order. (A statement followed by a question mark ( YOU GAMBLE? )
18300	is considered to be communicatively equivalent in its effects to that
18400	statement  followed by a period.) Synonymic translations of words are
18500	made so that the pattern becomes, for example:
18600		(  WHAT  BE  YOU  JOB )
18700	Groups of words are translated  into  a  single  word  so  that,  for
18800	example, "for a living" becomes "job".
18900		Certain juxtaposed words are made into a single word, e.g.
19000	"GET  ALONG  WITH"  becomes  "GETALONGWITH". This is done (1) to deal
19100	with groups of words which are represented as  single  words  in  the
19200	stored  pattern and (2) to prevent segmentation from occurring at the
19300	wrong places, such as at a  preposition  inside  an  idiom.   Besides
19400	these  contractions, certain expansions are made so that for example,
19500	"DON'T" becomes "DO NOT" and "I'D" becomes "I WOULD".
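The preprocessing stage described above can be sketched as follows. The dictionary, synonym, expansion, and word-group tables shown are a tiny invented subset for illustration only, not PARRY2's actual 1240-word dictionary:

```python
# Sketch of PARRY2-style preprocessing. All table entries are illustrative.
EXPANSIONS = {"DON'T": ["DO", "NOT"], "I'D": ["I", "WOULD"]}
SYNONYMS = {"IS": "BE", "ARE": "BE", "YOUR": "YOU",
            "OCCUPATION": "JOB", "EMPLOYED": "JOB"}
# Word groups collapsed into one token, protecting idioms from segmentation.
GROUPS = {("FOR", "A", "LIVING"): "JOB",
          ("GET", "ALONG", "WITH"): "GETALONGWITH"}
DICTIONARY = {"WHAT", "BE", "YOU", "JOB", "DO", "NOT", "I", "WOULD",
              "GETALONGWITH", "WHERE", "WORK"}

def preprocess(text):
    # Terminal punctuation is thrown away; questions are recognized by word order.
    words = text.rstrip("?.!").upper().split()
    # Expand contractions: DON'T -> DO NOT, I'D -> I WOULD.
    expanded = []
    for w in words:
        expanded.extend(EXPANSIONS.get(w, [w]))
    # Contract word groups into single tokens.
    out, i = [], 0
    while i < len(expanded):
        for group, repl in GROUPS.items():
            if tuple(expanded[i:i + len(group)]) == group:
                out.append(repl)
                i += len(group)
                break
        else:
            out.append(expanded[i])
            i += 1
    # Synonymic translation, then drop any word not in the dictionary.
    out = [SYNONYMS.get(w, w) for w in out]
    return [w for w in out if w in DICTIONARY]
```

With these toy tables, "WHAT IS YOUR CURRENT OCCUPATION?" becomes ( WHAT BE YOU JOB ), with the unknown word "CURRENT" silently dropped.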
19600	
19700			SEGMENTING
19800	
19900		Borrowing  a heuristic from machine-translation work by Wilks
20000	[ ] and  supported  by  evidence  from  psycholinguistic  experiments
20100	indicating that humans recognize spoken sentences a phrase at a time,
20200	we devised a way of bracketing the pattern  constructed  up  to  this
20300	point  into  shorter  segments using the list of words in Fig.1.  The
20400	new pattern formed is termed either "simple",  having  no  delimiters
20500	within it, or "compound", i.e. being made up of two or more simple
20600	patterns.  A simple pattern might be:
20700		( WHAT BE YOU JOB )
20800	whereas a compound pattern would be:
20900		(( WHY BE YOU ) ( IN HOSPITAL ))
21000	Our experience with this method of segmentation shows that compound
21100	patterns rarely consist of more than three or four fragments.
21200		After certain verbs ("THINK", "FEEL", etc.) a bracketing occurs
21300	to replace the commonly omitted "THAT", such that:
21400		( I THINK YOU BE AFRAID )
21500	becomes
21600		(( I THINK ) ( YOU BE AFRAID ))
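The segmenting heuristic can be sketched as follows. The delimiter list here merely stands in for the full list of Fig. 1, and the THAT-verb list is an assumed sample; the subsequent dropping of conjunctions is omitted for brevity:

```python
# Sketch of segmentation into simple patterns. DELIMITERS stands in for the
# Fig. 1 list (prepositions, wh-words, meta-verbs); membership is illustrative.
DELIMITERS = {"IN", "ON", "WHY", "WHAT", "WHERE", "HOW", "AND", "BUT"}
THAT_VERBS = {"THINK", "FEEL", "BELIEVE"}  # verbs after which an omitted THAT is assumed

def segment(words):
    segments, current = [], []
    for w in words:
        if w in DELIMITERS and current:
            # Open a new simple pattern at each delimiter word.
            segments.append(current)
            current = [w]
        elif current and current[-1] in THAT_VERBS:
            # Bracket after THINK/FEEL/etc. to restore the omitted "THAT".
            segments.append(current)
            current = [w]
        else:
            current.append(w)
    if current:
        segments.append(current)
    return segments
```

For example, ( WHY BE YOU IN HOSPITAL ) splits into (( WHY BE YOU ) ( IN HOSPITAL )), and ( I THINK YOU BE AFRAID ) into (( I THINK ) ( YOU BE AFRAID )).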
21700	
21800			PREPARATION FOR MATCHING
21900	
22000		Conjunctions serve only as markers for the segmenter and they
22100	are dropped out after segmentation.
22200		Negations  are  handled  by  extracting  the  "NOT"  from the
22300	pattern and assigning a value to a global variable which indicates to
22400	the algorithm that the expression is negative in form. When a pattern
22500	is finally matched, this variable is consulted. Some patterns have  a
22600	pointer  to  a  pattern  of opposite meaning if a "NOT" could reverse
22700	their meanings.  If this pointer is present and  a  "NOT"  is  found,
22800	then the pattern matched is replaced by its opposite. (Roger- need good example).
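The negation mechanism can be sketched as follows; the stored pattern and its opposite-pattern pointer are invented here purely for illustration:

```python
# Sketch of negation handling. The pattern and its "opposite" pointer are
# invented for illustration; PARRY2's actual tables differ.
OPPOSITES = {("I", "DO", "TRUST", "YOU"): ("I", "DO", "DISTRUST", "YOU")}

def extract_not(words):
    """Remove NOT from the pattern; return (pattern, negated-flag)."""
    negated = "NOT" in words
    return tuple(w for w in words if w != "NOT"), negated

def resolve(words):
    pattern, negated = extract_not(words)
    # If the expression was negative and the matched pattern carries a
    # pointer to its opposite, replace it by the opposite-meaning pattern.
    if negated and pattern in OPPOSITES:
        return OPPOSITES[pattern]
    return pattern
```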
22900	
23000			MATCHING AND RECYCLING
23100	
23200		The  algorithm  now  attempts to match the segmented patterns
23300	with stored patterns which are currently 1024  in  number.   First  a
23400	complete  and  perfect  match  is sought.  When a match is found, the
23500	stored pattern name has a pointer to the name of a response  function
23600	which  decides  what to do further.  If a match is not found, further
23700	transformations of the pattern are carried out and a "fuzzy" match is
23800	tried.
23900		For fuzzy matching at this stage, the contentive words in the
24000	pattern are dropped one at a time and a match  attempted  each  time.
24100	This  allows  ignoring  familiar  words in unfamiliar contexts.   For
24200	example, "well" is important in "Are you well?"  but  meaningless  in
24300	"Well are you?".
24400		Deleting one word at a time results in, for example, the pattern:
24500			(what be you main problem )
24600	becoming successively:
24700			(a) ( be you main problem )
24800			(b) ( what you main problem )
24900			(c) ( what be main problem )
25000			(d) ( what be you problem )
25100			(e) ( what be you main )
25200	Since the stored pattern, in this case, matches (d), (e) would not be
25201	constructed. We found it unwise to delete more than one word since
25202	our segmentation method yields segments containing a small (1-4)
25203	number of words.
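This one-word-at-a-time coarsening can be sketched as follows; the stored-pattern table is an illustrative stand-in for the 1024 actual patterns, and the response-function name is invented:

```python
# Sketch of fuzzy matching by single-word deletion. STORED stands in for the
# model's 1024 stored patterns; the entry and its name are illustrative.
STORED = {("WHAT", "BE", "YOU", "PROBLEM"): "ASK-PROBLEM"}

def fuzzy_match(words):
    words = tuple(words)
    if words in STORED:              # a complete, perfect match is sought first
        return STORED[words]
    for i in range(len(words)):      # then drop one word at a time and retry
        candidate = words[:i] + words[i + 1:]
        if candidate in STORED:
            return STORED[candidate]
    return None                      # default condition: pass surface to memory
```

On ( WHAT BE YOU MAIN PROBLEM ) the deletion of "MAIN" yields case (d) of the example above and the search stops there, so (e) is never constructed.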
25600		The  transformations  described above result in a progressive
25700	coarsening of the patterns by deletion. Substitutions are  also  made
25800	in  certain  cases.  Some patterns contain pronouns which could stand
25900	for a number of different things of importance to PARRY2.  The
26000	pattern:
26100		(DO YOU AVOID THEM)
26200	could refer to the Mafia, or racetracks,  or  other  patients.   When
26300	such  a pattern is recognized, the pronoun is replaced by its current
26400	anaphoric value, and a more specific pattern such as:
26500		(DO YOU AVOID MAFIA)
26600	is  looked  up.  In many cases, the meaning of a pattern containing a
26700	pronoun is clear without any substitution.  In the pattern:
26800		((HOW DO THEY TREAT YOU) (IN HOSPITAL))
26900	the meaning of THEY is clarified by (IN HOSPITAL).
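The recycle condition can be sketched as follows; the stored pattern, its response-function name, and the anaphora table are all illustrative:

```python
# Sketch of the recycle condition: substitute the current anaphoric value
# for a pronoun and look the more specific pattern up again.
STORED = {("DO", "YOU", "AVOID", "MAFIA"): "AVOID-MAFIA"}  # illustrative
ANAPHORA = {"THEM": "MAFIA"}  # current value, set by earlier dialogue

def recycle(words):
    substituted = tuple(ANAPHORA.get(w, w) for w in words)
    return STORED.get(substituted)   # None if still no match
```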
27000	
27100			COMPLEX-PATTERN MATCH
27200	
27300		When more than one simple pattern is detected in the input, a
27400	second matching is attempted.  The methods used are  similar  to  the
27500	first matching.  Certain patterns, such as (HELLO) and (I THINK), are
27600	dropped because they are meaningless.  If a  complete  match  is  not
27700	found,  then  simple  patterns  are  dropped, one at a time, from the
27800	complex pattern. This allows the input,
27900		((HOW DO YOU COME) (TO BE) (IN HOSPITAL))
28000	to  match  the  stored  pattern,
28100		((HOW  DO  YOU COME) (IN HOSPITAL)).
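The complex-pattern match can be sketched in the same style; the meaningless-pattern list is taken from the text, while the stored compound pattern and its name are illustrative:

```python
# Sketch of complex-pattern matching: drop meaningless simple patterns, then
# drop remaining simple patterns one at a time until a stored compound matches.
MEANINGLESS = {("HELLO",), ("I", "THINK")}
STORED = {(("HOW", "DO", "YOU", "COME"), ("IN", "HOSPITAL")): "WHY-HOSPITALIZED"}

def match_compound(segments):
    segments = tuple(tuple(s) for s in segments if tuple(s) not in MEANINGLESS)
    if segments in STORED:               # complete match first
        return STORED[segments]
    for i in range(len(segments)):       # drop one simple pattern at a time
        candidate = segments[:i] + segments[i + 1:]
        if candidate in STORED:
            return STORED[candidate]
    return None                          # default condition
```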
28200	
28300		If  no  match  can  be found at this point, the algorithm has
28400	arrived at a default condition and the appropriate response functions
28500	decide  what  to  do.  For example, in a default condition, the model
28600	may assume  control  of  the  interview,  asking  the  interviewer  a
28700	question, continuing with the topic under discussion or introducing a
28800	new topic.
28900	
29000		ADVANTAGES AND LIMITATIONS
29100	
29200		As   mentioned,   one   of   the   main   advantages   of   a
29300	characterization  strategy  is  that  it  can ignore what it does NOT
29400	recognize.   There  are at least  415,000  words  in  English,   each
29500	possessing  one  to one hundred senses. To construct a machine-usable
29600	dictionary of this magnitude is out of the question at this time.   A
29700	characterization of natural language input such as described above
29800	allows real-time interaction in a dialogue since it avoids becoming
29900	ensnared in "understanding" and metainterpretations of language
30000	which would slow down a dialogue to impracticality, if it could  even
30100	occur at all.
30200		A drawback to PARRY1 was that it reacted to the first pattern
30300	it  found  in the input rather than characterizing the input as fully
30400	as possible and then deciding what to do based on a number of  tests.
30500	Another   practical   difficulty  with  PARRY1  from  a  programmer's
30600	viewpoint,  was  that  the  patterns  were  strung  out  in   various
30700	procedures  throughout  the  algorithm.   It was often a considerable
30800	chore for the programmer to determine whether  a  given  pattern  was
30900	present  and  precisely  where it was. In the above-described method,
31000	the patterns are all collected in one part  of  the  data-base  where
31100	they can easily be examined.
31200		Concentrating  all the patterns in the data base gives PARRY2
31300	a limited "learning" ability.  When  an  input  fails  to  match  any
31400	stored  pattern  or  matches  an  incorrect one, as judged by a human
31500	operator, a pattern matching the input can be put into the data  base
31600	automatically.  If  the  new  pattern  has  the  same  meaning  as  a
31700	previously stored pattern, the human operator must provide  the  name
31800	of  the  appropriate  response  function.  If he doesn't remember the
31900	name, he may try to rephrase the input  in  a  form  recognizable  to
32000	PARRY2  and  it  will  name the response function associated with the
32100	rephrasing.  These mechanisms are not "learning" in the commonly used
32200	sense  but  they  do  allow  a  person to transfer his knowledge into
32300	PARRY2's data base with very little redundant effort.
32400		We have a number of performance measures on  PARRY1  along  a
32500	number  of  dimensions including "linguistic non-comprehension". That
32600	is, judges estimated PARRY1's abilities along this dimension on a 0-9
32700	scale.   They  also  rated  human  patients and a "random" version of
32800	PARRY1 in this manner. (GIVE BAR-GRAPH HERE AND DISCUSS.)  We have
32900	collected  ratings of PARRY2 along this dimension to determine if the
33000	characterization  process  represents  an  improvement  over  PARRY1.
33100	(FRANK AND KEN EXPERIMENT).